DigitHist: a Histogram-Based Data Summary with Tight Error Bounds
Abstract
We propose DigitHist, a histogram summary for selectivity estimation on multi-dimensional data with tight error bounds. By combining multi-dimensional and one-dimensional histograms along regular grids of different resolutions, DigitHist provides an accurate and reliable histogram approach for multi-dimensional data. To achieve a compact summary, we use a sparse representation combined with a novel histogram compression technique that chooses a higher resolution in dense regions and a lower resolution elsewhere. For the construction of DigitHist, we propose a new error measure, termed u-error, whose minimization narrows the gap between the guaranteed upper and lower bounds of the selectivity estimate. The construction algorithm performs a single data scan and has linear time complexity. An in-depth experimental evaluation shows that DigitHist delivers higher precision and tighter error bounds than state-of-the-art competitors at a comparable query time.
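To make the notion of guaranteed upper and lower bounds concrete, the following is a minimal sketch of a plain single-resolution grid histogram over the unit cube. It is not the DigitHist construction: there is no multi-resolution combination, no compression, and no u-error optimization. The function names `build_grid_histogram` and `selectivity_bounds`, the parameter `cells_per_dim`, and the assumption that coordinates lie in [0, 1) are all illustrative choices, not taken from the paper.

```python
from collections import Counter


def build_grid_histogram(points, cells_per_dim):
    """Count points per cell of a regular grid (single scan, sparse storage).

    Assumption: every coordinate lies in [0, 1).
    """
    hist = Counter()
    for p in points:
        cell = tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in p)
        hist[cell] += 1
    return hist  # empty cells are simply absent


def selectivity_bounds(hist, cells_per_dim, low, high):
    """Return (lower, upper) bounds on the number of points in the box
    low[i] <= x[i] <= high[i].

    lower: counts of cells lying entirely inside the query box.
    upper: counts of cells that touch the query box at all.
    """
    k = cells_per_dim
    lower = upper = 0
    for cell, count in hist.items():
        dims = list(zip(cell, low, high))
        inside = all(lo <= c / k and (c + 1) / k <= hi for c, lo, hi in dims)
        touches = all(c / k <= hi and (c + 1) / k >= lo for c, lo, hi in dims)
        if inside:
            lower += count
        if touches:
            upper += count
    return lower, upper


points = [(0.1, 0.1), (0.3, 0.3), (0.6, 0.6), (0.9, 0.9), (0.3, 0.6)]
hist = build_grid_histogram(points, cells_per_dim=4)
print(selectivity_bounds(hist, 4, low=(0.2, 0.2), high=(0.7, 0.7)))  # (1, 4)
```

Three of the five points fall in the query box, so the true count lies between the bounds 1 and 4. A finer grid would narrow that gap at the cost of more stored cells, which is the trade-off the abstract's density-dependent resolution addresses.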
Similar resources
Robust Identification of Smart Foam Using Set Membership Estimation in a Model Error Modeling Framework
The aim of this paper is robust identification of smart foam, as an electroacoustic transducer, considering unmodeled dynamics due to nonlinearities in behaviour at low frequencies and measurement noise at high frequencies as existent uncertainties. Set membership estimation combined with model error modelling technique is used where the approach is based on worst case scenario with unknown but...
Steganalysis Method for LSB Replacement Based on Local Gradient of Image Histogram
In this paper we present a new accurate steganalysis method for LSB replacement steganography. The suggested method is based on the changes that occur in the histogram of an image after the embedding of data. Every pair of neighboring bins of a histogram is either inter-related or unrelated, depending on whether embedding a bit of data in the image could affect both bins or not. We show that...
GPS/INS Integration for Vehicle Navigation based on INS Error Analysis in Kalman Filtering
The Global Positioning System (GPS) and an Inertial Navigation System (INS) are two basic navigation systems. Due to their complementary characters in many aspects, a GPS/INS integrated navigation system has been a hot research topic in the recent decade. The Micro Electrical Mechanical Sensors (MEMS) successfully solved the problems of price, size and weight with the traditional INS. Therefore...
Generalization Error Bounds Using Unlabeled Data
We present two new methods for obtaining generalization error bounds in a semi-supervised setting. Both methods are based on approximating the disagreement probability of pairs of classifiers using unlabeled data. The first method works in the realizable case. It suggests how the ERM principle can be refined using unlabeled data and has provable optimality guarantees when the number of unlabele...
Estimation of mutual information by the fuzzy histogram
Mutual Information (MI) is an important dependency measure between random variables, due to its tight connection with information theory. It has numerous applications, both in theory and practice. However, when employed in practice, it is often necessary to estimate the MI from available data. There are several methods to approximate the MI, but arguably one of the simplest and most wi...
Journal: PVLDB
Volume: 10
Publication date: 2017